Abstract

The language competition model of Viviane de Oliveira et al. is modified by
associating with each language a string of 32 bits. Whenever a language changes in
this Viviane model, one randomly selected bit is also flipped. If only languages
with different bit-strings are then counted as different, the resulting size distribution of
languages agrees with the empirically observed, slightly asymmetric log-normal
distribution. Several other modifications were also tried, but they either had more free
parameters or agreed less well with reality.



1 Introduction 

The competition between languages of adult humans, leading to the extinction
of some, the emergence of new and the modification of existing languages,
has been simulated recently by many physicists [1-11] and others
[12-14]; see also [15] for the learning of languages by children. The web site
http://www.isrl.uiuc.edu/amag/langev/ lists 10^3 linguistic computer simulations,
and recent reviews of language competition simulations were given in
[16-18]. Perhaps the empirically best-known aspect of language competition
is the present distribution n_s of language sizes s, where the size s of a
language is defined as the number of people speaking mainly this language,
and n_s is the number of different languages spoken by s people. We leave
it to linguists and politicians to distinguish languages from dialects and rely
on the widely used "Ethnologue" statistics [19-22] repeated in Fig. 1. This
log-log plot shows a slightly asymmetric parabola, corresponding to a
log-normal distribution with enhancement for small sizes s ~ 10. Our aim is to
reproduce this empirically observed distribution in an equilibrium simulation;
previously it was achieved only for non-equilibrium [23].

Figure 1: Empirical size distribution of the ~ 10^4 present human languages,
binned in powers of two. The curve shows a fitted parabola, corresponding to
a log-normal distribution. For small languages the real numbers of languages
are higher than this parabolic fit. From [23]. (Plot title: Distribution of
language sizes from Grimes, Ethnologue, and 550 exp[-0.05{ln(size/7000)}**2].)
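The fitted curve of Fig. 1 can be evaluated directly. The following is a minimal Python sketch, assuming the amplitude 550, the factor 0.05 and the centre 7000 quoted in the plot title of Fig. 1; the function name and the sizes at which the curve is sampled are our own choices for illustration.

```python
import math

def lognormal_fit(size, amplitude=550.0, width=0.05, centre=7000.0):
    """Curve from the plot title of Fig. 1:
    amplitude * exp(-width * ln(size/centre)**2)."""
    return amplitude * math.exp(-width * math.log(size / centre) ** 2)

# ln(fit) is quadratic in ln(size), i.e. a parabola in a log-log plot.
# Sample it at powers of two, matching the binning of Fig. 1.
for k in range(0, 31, 5):
    s = 2 ** k
    print(f"s = 2^{k:2d}   fit = {lognormal_fit(s):8.2f}")
```

This fit is exactly symmetric about size = 7000 on a logarithmic axis, whereas the empirical data lie above it for small languages.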

Of the many models cited above only the "Schulze" model [7] and the
"Viviane" model [11] gave thousands of languages as in reality. The Schulze model
gave a reasonable n_s distribution in non-equilibrium [23], when observed during
its phase transition between the dominance of one language spoken by
most people and the fragmentation into numerous small languages. The
Viviane model does not have such a phase transition [17], and we now attempt
to get from it a realistic n_s in equilibrium.

The next section defines the standard Viviane model [11] for the reader's
convenience. Section 3 gives our bit-string modification and the resulting
improved n_s, while Section 4 lists other attempts to get a good size distribution.
The concluding Section 5 compares our various attempts.

2 Viviane Model 

The original Viviane model [11] simulates the spread of humans over a
previously uninhabited continent. Each site j of an L x L square lattice can
later be populated by c_j people, where c_j is initially fixed randomly between
1 and m ~ 10^2. On a populated site only one language is spoken. Initially
only one single site i is occupied by c_i people.

Then, as in Eden cluster growth or the Leath percolation algorithm, at each
time step one surface site (= empty neighbour j of the set of all occupied sites)
is selected randomly, and then occupied with probability c_j/m by c_j people.
These settlers first select as language that of one of their occupied neighbour
sites, with a probability proportional to the fitness of that language. This
fitness F_k is the total number of people speaking the language k of that site,
summed over all lattice sites occupied at that time. (In [11], this fitness was
bounded from above by a maximum M_k selected randomly between 1 and
M_max ~ 20m.) After a language is selected, it is mutated into a new language
with probability α/F_k, with a mutation factor α typically between 10^-3 and
1. From then on the population and language of the just occupied lattice site
remain constant. Equilibrium is reached when all lattice sites have become
occupied and the simulation stops. As a result of this algorithm, the various
languages are numbered 1, 2, 3, ... without any internal structure of the
languages.
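The growth rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' program: the c_j are taken as integers, the first site is placed at the lattice centre, the upper fitness bound M_k is omitted, and all names (viviane, cap, lang, fitness) are ours.

```python
import random
from collections import Counter

def viviane(L=32, m=127, alpha=0.05, seed=1):
    """Return n_s as a Counter {size s: number of languages with s speakers}."""
    rng = random.Random(seed)
    cap = [[rng.randint(1, m) for _ in range(L)] for _ in range(L)]  # c_j
    lang = [[0] * L for _ in range(L)]   # 0 marks an empty site
    fitness = {}                         # F_k = speakers of language k (no bound M_k here)
    surface, in_surface = [], set()

    def neighbours(x, y):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            u, v = x + dx, y + dy
            if 0 <= u < L and 0 <= v < L:
                yield u, v

    def occupy(x, y, k):
        lang[x][y] = k
        fitness[k] = fitness.get(k, 0) + cap[x][y]
        for u, v in neighbours(x, y):    # empty neighbours join the surface
            if lang[u][v] == 0 and (u, v) not in in_surface:
                in_surface.add((u, v))
                surface.append((u, v))

    occupy(L // 2, L // 2, 1)            # assumption: start at the centre
    n_lang = 1
    while surface:
        i = rng.randrange(len(surface))
        x, y = surface[i]
        if rng.random() >= cap[x][y] / m:
            continue                     # not occupied now; stays on the surface
        surface[i] = surface[-1]         # remove the chosen site in O(1)
        surface.pop()
        in_surface.discard((x, y))
        cands = [lang[u][v] for u, v in neighbours(x, y) if lang[u][v]]
        k = rng.choices(cands, weights=[fitness[c] for c in cands])[0]
        if rng.random() < alpha / fitness[k]:
            n_lang += 1                  # mutation creates a new language label
            k = n_lang
        occupy(x, y, k)
    return Counter(fitness.values())

n_s = viviane()
print(sum(n_s.values()), "languages,",
      sum(s * n for s, n in n_s.items()), "speakers")
```

Keeping the surface sites in a list with swap-and-pop removal makes the random selection cheap; the lattices used in the paper are far larger than this default and would need a more economical implementation.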

The resulting language size distribution n_s in Fig. 2 has a sharp maximum
near s ~ m, and follows one power law (exponent 1) to the left of the
maximum and another power law to its right. As in reality it extends from
s = 1 to s = 10^9 for the number s of people speaking one language. But the
sharp maximum is not seen in reality, Fig. 1, and the simulated slope on the
right of the maximum is weaker than the one on its left, while reality shows
the opposite asymmetry: less slope on the left than on the right.

Figure 2: Language size distribution n_s for the standard Viviane model, with
s varying from 1 to 10^9. The absolute value of the slope to the right is smaller
than the one on the left, in contrast to reality, Fig. 1. m = 127, M_max = 16m,
also in Figs. 3, 5, 6, 8. (Plot title: 50 samples, 10001 x 10001, alpha = 0.002.)

With increasing mutation factor α, the fraction of people speaking the
largest language decreases smoothly, Fig. 3, without showing a sharp phase
transition (in contrast to the Schulze model). For increasing lattice size L
the curves shift slightly (logarithmically?) to smaller α values.

(The program listed in [17] gave a limiting fitness M_j to each site j,
instead of an M_k to each language. Thus before the mutations are simulated
we need there the line f(lang(j)) = min(limit(lang(j)), f(lang(j)) +
c(j)*fac). This mistake barely affects the n_s, Fig. 2, but after correction
the resulting size effect in our Fig. 3 is weaker than in Fig. 3 of [17].)

3 Bit-string modification 

We now improve the Viviane model in three ways: 

Figure 3: Variation of the fraction of people speaking the largest language.
The linear lattice size L increases from right to left. For mutation factor
α = 0 by definition everybody speaks the language of the initially occupied
site. (Plot title: 100 samples, L = 256, 512, 1024, 2048, 4096, 8192.)

i) We give the Viviane languages an internal structure by associating with
each language a string of, say, ℓ = 16 bits, initially all set to zero. At each
mutation of the language at the newly occupied site, one randomly selected
bit is flipped, from 0 to 1 or from 1 to 0. We count languages as different
only if they have different bit-strings. Otherwise the standard algorithm is
unchanged. Thus our new bit-strings do not influence the dynamics of the
population spread, only the counting of languages.

ii) Thus far the populations c_j per site j were homogeneously distributed
between 1 and m. In reality, there are more bad than good sites for human
settlement. We approximate this effect by assuming that the values of c,
still scattered between 1 and m, are no longer distributed with a constant
probability but with a probability proportional to 1/c.

iii) Instead of occupying one randomly selected surface site i with
probability proportional to c_i, we saved lots of computer time by selecting two
such surface sites and occupying the one with the bigger c.

(As a minor improvement we counted a neighbour language only once if 
two or more neighbours of the just occupied site speak that language.) 
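Modification i) amounts to one XOR per mutation plus a different way of counting. The sketch below is a minimal Python illustration, assuming that each bit-string is stored as an integer and that every site carries the bit-string of its language; the names N_BITS, mutate and count_by_bitstring are hypothetical.

```python
import random

N_BITS = 16   # the bit-string length, 16 "say" in the text

def mutate(bits, rng, n_bits=N_BITS):
    """Flip one randomly selected bit (0 -> 1 or 1 -> 0) of an integer bit-string."""
    return bits ^ (1 << rng.randrange(n_bits))

def count_by_bitstring(site_bits, site_pop):
    """Pool the speakers of all sites sharing a bit-string: {bit-string: size}."""
    sizes = {}
    for b, c in zip(site_bits, site_pop):
        sizes[b] = sizes.get(b, 0) + c
    return sizes

rng = random.Random(0)
bits = 0                              # all bits initially zero
for _ in range(3):                    # three successive mutations
    bits = mutate(bits, rng)
print(format(bits, f"0{N_BITS}b"))

# Three sites, two of which share bit-string 0, give only two languages.
print(count_by_bitstring([0, 1, 0], [10, 5, 7]))
```

Because a later flip can undo an earlier one, two languages with different labels in the standard model may end up with the same bit-string and are then counted as one language.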

Figure 4: Language size distribution for the bit-string version, L = 15000, m =
250, M_max = 300, α = 0.05, ℓ = 14 bits. (Plot legend: Speakers =
10,038,473,698, m = 250, Mmax = 300, alpha = 0.05.)

Fig. 4 shows that these modifications are good enough to result in reasonable
agreement with reality, Fig. 1. The shape of the curve is robust against
a wide variation of the parameters. We do not show plots for different m
since 1 < F_j < m and for fixed m/M_max the simulations depend only on the
ratio α/F_j. The total number of languages is only 5 x 10^3, less than the real
[19] value 7 x 10^3, for which we would need bigger lattices than our computer
memory can store.

As in [21] for the Schulze model, the bit-strings allow a study of spatial
correlations: What is the Hamming distance for languages separated by a
distance r? The Hamming distance for two bit-strings, used already in [23,
24] for the Schulze model, is the number of bits which differ from each other
in a position-by-position comparison of the two bit-strings. Thus initially we
occupy the top line of the L x L lattice with L different languages, all having
bit-string zero, then start the standard Viviane dynamics, and at the end we
sum over all Hamming distances of all sites on lattice line r, compared with
the corresponding sites on the first lattice line. (By definition, this Hamming
distance is zero for r = 1.) Fig. 5 shows our correlation functions, similar to
reality [2UI2E]; the higher the mutation factor α, the higher the Hamming
distance. This simulation for Fig. 5 used only modification i) and involved
no counting of languages.
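The summed Hamming distance can be written compactly when bit-strings are stored as integers. This is a minimal Python sketch under that assumption; lattice_bits is a hypothetical list of lattice lines, each a list holding the bit-string of every site, and lines are counted from r = 1 as in the text.

```python
def hamming(a, b):
    """Number of bit positions in which two integer bit-strings differ."""
    return bin(a ^ b).count("1")

def summed_hamming(lattice_bits, r):
    """Sum of Hamming distances between the sites of line r and the
    corresponding sites of the first line (r = 1 gives zero)."""
    first, line = lattice_bits[0], lattice_bits[r - 1]
    return sum(hamming(x, y) for x, y in zip(first, line))

# Tiny 2 x 2 example: the first line has bit-string zero everywhere.
lattice_bits = [[0b00, 0b00],
                [0b01, 0b11]]
print(summed_hamming(lattice_bits, 1), summed_hamming(lattice_bits, 2))
```

For two independent, uniformly random strings of ℓ bits the expected Hamming distance is ℓ/2, so an uncorrelated line of L sites would give a sum of about Lℓ/2. Whether the straight line in Fig. 5 was obtained in exactly this way is our assumption.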
Figure 5: Summed Hamming distance versus geometric distance. Upper part:
increase with increasing mutation factor, with the straight line on top giving
the limit of uncorrelated bit-strings. Lower part: variation with the length ℓ
of our bit-string, taken as ℓ = 32 in the upper part.
4 Other modifications



4.1 Noise 

Ref. [23] improved the language size distribution of the Schulze model by
applying random multiplicative noise, that is, by multiplying at the end
of one simulation each n_s repeatedly by a random number taken between 0.9
and 1.1. This modification approximates external influences from outside the
basic model. Such noise is applied in Fig. 6 to the standard Viviane model
with the additional modification of correlations: each random number is used
twice, one after the other. Here we multiplied each n_s a thousand times by a
factor (0.9 + 0.2z)^2 at each iteration, and we summed over a thousand samples.
(Here z is a random number homogeneously distributed between 0 and 1.)
We start the simulations with a small mutation factor α = 0.001, and at
each iteration this grows linearly until it reaches a value of α = 0.916, for
all lattice sizes used here: L = 257, 513, 1023, 2047 and 4095. Fig. 6 shows
a slightly asymmetric parabola, but as in Fig. 2 with the wrong asymmetry:
too slow a decay on the right.
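The noise step can be sketched as follows in Python. We assume here that every n_s receives its own independent random factor at each iteration, which the text does not state explicitly; the function name and the dictionary representation of n_s are ours.

```python
import random

def apply_noise(n_s, iterations=1000, correlated=True, seed=0):
    """Multiply each n_s repeatedly by a random factor between 0.9 and 1.1.
    With correlated=True each random number is used twice, so the factor
    per iteration is (0.9 + 0.2*z)**2."""
    rng = random.Random(seed)
    out = dict(n_s)                      # leave the input untouched
    for _ in range(iterations):
        for s in out:
            f = 0.9 + 0.2 * rng.random() # z uniform between 0 and 1
            out[s] *= f * f if correlated else f
    return out

raw = {1: 120.0, 2: 300.0, 4: 450.0, 8: 200.0}   # made-up histogram
print(apply_noise(raw, iterations=100, correlated=False))
```

Repeated multiplication makes ln n_s perform a random walk, so the spread of the noisy values grows with the number of iterations.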

4.2 Power law for populations per site 

Using only modification ii) of Section 3, and adding random multiplicative
noise (100 multiplications with 0.9 + 0.2z, without correlations), Fig. 7 now
shows reasonable asymmetric parabolas for equilibrium, similar to those for
the non-equilibrium Schulze model.
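Drawing the site populations with probability proportional to 1/c, as in modification ii), might look like this in Python. The sketch assumes integer values of c between 1 and m; the function name is hypothetical.

```python
import random

def sample_capacities(n, m, seed=0):
    """Draw n integer populations c in 1..m with probability proportional to 1/c."""
    rng = random.Random(seed)
    values = range(1, m + 1)
    weights = [1.0 / c for c in values]  # unnormalised; choices() normalises
    return rng.choices(values, weights=weights, k=n)

caps = sample_capacities(10000, 8192)
print(min(caps), max(caps), sum(c <= 90 for c in caps) / len(caps))
```

For a continuous variable the same distribution follows from c = m^u with u uniform between 0 and 1, since a 1/c density is uniform in ln c. Small populations are then far more frequent than under the homogeneous distribution.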

4.3 Indigenous population 

We modified the standard Viviane model by assuming that initially the lattice 
is not empty but is occupied by a native population which in our simulation 
is then overrun by some foreign invaders. Thus initially each lattice site gets 
a native fitness 1/z where z is a random number homogeneously distributed 
between zero and one. In the later conquest by the foreign invaders, this site 
is conquered only if the fitness of the invader is larger than the native fitness 
(minus 10). It is possible that a few sites cannot be conquered, since they 
are defended by Asterix, Obelix or other powerful natives. 

We found that this modification barely changes the final distribution of
language sizes. For various mutation factors α, Fig. 8 shows that again we
have two power laws (straight lines in this log-log plot) for small and for
large language sizes. The time after which the "conquistadores" finish their
conquest varies very little from sample to sample (not shown). Adding as
before random multiplicative noise by 100 multiplications by 0.9 + 0.2z makes
the maximum smoother (not shown), but still with the wrong asymmetry.

Figure 6: Language size distribution from multiplicative noise and varying
mutation factor (Viviane model without bit-strings). (Plot title: L = 256,
513, 1023, 2047, 4095 from bottom to top.)



5 Conclusion 

While we have offered various modifications in order to improve the results
from the standard Viviane model, we think the one of Section 3 is the best
since it is simple and introduces no new free parameters except ℓ. We have
seen a reasonable agreement with the slightly asymmetric log-normal
distribution of language sizes. Future work could replace the bits by integer
variables between 1 and Q as in some Schulze models [17], or look at
language families [27].
Figure 7: Language size distribution with power law distribution for the c_j
and random multiplicative noise; m = 8192, M_max = 16m (Viviane model
without bit-strings). (Plot title: L = 1000 (+) and 10000 (x), 50 or 4 samples.)